String

Strings are Python's built-in data type for handling text. They are immutable, so you cannot add, remove, or update any character in an existing string. To perform such an operation, you create a new string and bind it to a new or existing variable name.

A string is a sequence of characters.
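
The two statements above (sequence behaviour and immutability) can be checked with a minimal sketch. The variable names are illustrative only and do not come from the notebook.

```python
# Strings support indexing and slicing like any other sequence.
word = "Camel"
first = word[0]        # 'C'
tail = word[1:]        # 'amel'

# Item assignment is rejected because str is immutable.
try:
    word[0] = "K"
    mutated = True
except TypeError:
    mutated = False

# To "change" a string, build a new one and bind it to a name.
new_word = "K" + word[1:]
print(first, tail, mutated, new_word)   # C amel False Kamel
```

The original object bound to word is never modified; new_word refers to a separate string.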

characters


Escape Characters

An escape character is a character that invokes an alternative interpretation of the characters that follow it in a character sequence. An escape character is a particular case of a metacharacter. In Python string literals, the escape character is the backslash (\).

Table: Escape Characters
Escape sequence   Hex value in ASCII   Character represented
\a                07                   Alert (Beep, Bell)
\b                08                   Backspace
\f                0C                   Formfeed
\n                0A                   Newline (Line Feed)
\r                0D                   Carriage Return
\t                09                   Horizontal Tab
\v                0B                   Vertical Tab
\\                5C                   Backslash
\'                27                   Single quotation mark
\"                22                   Double quotation mark
\?                3F                   Question mark (C only, used to avoid trigraphs; not a Python escape)
\nnn              any                  The character whose numerical value is given by nnn interpreted as an octal number
\xhh              any                  The character whose numerical value is given by hh interpreted as a hexadecimal number (exactly two digits in Python)
\e                1B                   Escape character (some C compilers only; in Python write \x1b)
\Uhhhhhhhh        none                 Unicode code point where h is a hexadecimal digit
\uhhhh            none                 Unicode code point below 10000 hexadecimal
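
A short sketch of how some of these sequences behave in Python string literals. The variable names are illustrative, and only escapes that Python itself recognises are used.

```python
# Each escape sequence below is a single character in the resulting string.
tab = "\t"
newline = "\n"
backslash = "\\"
print(len(tab), len(newline), len(backslash))   # 1 1 1

# Numeric escapes: octal (\ooo), hex (\xhh) and Unicode (\uhhhh, \Uhhhhhhhh).
letter_a_octal = "\101"      # octal 101 == decimal 65 == 'A'
letter_a_hex = "\x41"        # hex 41 == decimal 65 == 'A'
e_acute = "\u00e9"           # code point U+00E9
snake = "\U0001F40D"         # a code point above U+FFFF
print(letter_a_octal, letter_a_hex, ord(e_acute), len(snake))   # A A 233 1

# ord() returns the code point, matching the hex column of the table.
print(hex(ord("\a")), hex(ord("\v")))   # 0x7 0xb
```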

String Types

Strings can be classified into two categories.

  • Standard String: a standard string is one in which escape sequences are processed.
  • Raw String: a raw string, on the other hand, treats backslashes as ordinary characters and does not process escape sequences.

Standard String



In [7]:
#### Standard String Examples: 
friend = 'Chandu\tNalluri' 
print(friend)


Chandu	Nalluri

In [12]:
manager_details = "# Roshan Musheer:\nExcellent Manager and human being."
print(manager_details)


# Roshan Musheer:
Excellent Manager and human being.

Raw String and Unicode String

  • Raw String: a = r'Roshan\tMusheer' prints Roshan\tMusheer
  • Unicode String: u = u'Björk'

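Since Python 3, every string is a Unicode string by default (in Python 3.3 and later the u prefix is accepted but has no effect).

In Python 2, a standard string could be converted to Unicode with the function unicode(). That function does not exist in Python 3, where str is already Unicode.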

A string literal can be written:

  • With single or double quotes ('', "").
  • Across several consecutive lines, provided that it is enclosed in three single or double quotes (''' ''', """ """).
  • As a raw string, without escape processing (example: s = r'\n', where s contains the two characters \ and n).
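
The three forms listed above can be sketched as follows. The names single, double, multi and raw are illustrative only.

```python
single = 'single quotes'
double = "double quotes"

# Triple quotes keep the line breaks that appear in the source.
multi = """first line
second line"""

# A raw literal keeps the backslash: two characters, '\' and 'n'.
raw = r'\n'

print(single, double, sep=" | ")
print(multi.splitlines())    # ['first line', 'second line']
print(len(raw), len('\n'))   # 2 1
```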

In [7]:
a = r'Roshan\tMusheer'
print(a)


Roshan\tMusheer

In [1]:
path = "C:\new_data\technical_jargons"
print(path)
path = R"C:\new_data\technical_jargons"
print(path)


C:
ew_data	echnical_jargons
C:\new_data\technical_jargons

NOTE: both r and R work the same way


In [9]:
a = 'Roshan\tMusheer'
print(a)


Roshan	Musheer

String Operations:

Creation


In [16]:
s = 'Camel'
print(id(s))


140100897256984

Concatenation

String concatenation is the process of joining two or more strings into a single string. As already discussed, strings are immutable, so concatenation always creates a new string: the original strings remain unchanged, and the new one is built from their text.

There are multiple ways to achieve concatenation. The most common is the + operator.

Let's take an example with three strings and concatenate them using +.


In [21]:
st_the = "The "
st_action = " ran away !!!"
st = st_the + s + st_action
print(st)
print(s)
print(st_the)
print(st_action)
print(id(st))

print(id(st_the))
print(id(s))
print(id(st_action))


The Camel ran away !!!
Camel
The 
 ran away !!!
140100897622392
140100897828624
140100897256984
140100897312688
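
Besides the + operator, a few other common ways of building the same string are sketched below. The variable names are illustrative, and the list is not meant to be exhaustive.

```python
the = "The "
animal = "Camel"
action = " ran away !!!"

with_plus = the + animal + action
with_join = "".join([the, animal, action])
with_format = "{}{}{}".format(the, animal, action)

# All three approaches build the same new string.
print(with_plus)                               # The Camel ran away !!!
print(with_plus == with_join == with_format)   # True

# Adjacent string literals are also concatenated at compile time.
literal = "The " "Camel"
print(literal)                                 # The Camel

# The originals are untouched, because each result is a new object.
print(the, animal, action, sep="|")            # The |Camel| ran away !!!
```

For joining many pieces, for example inside a loop, "".join(...) is usually preferred over repeated use of +.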

In [3]:
print(dir(s))


['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']

Interpolation

String interpolation (also called variable interpolation, variable substitution, or variable expansion) is the process of evaluating a string literal that contains one or more placeholders, yielding a result in which the placeholders are replaced with their corresponding values.
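
As a rough overview, the same placeholder substitution can be written in four styles. This is a sketch that assumes Python 3.6 or later (needed for the f-string line); the variable names are illustrative.

```python
from string import Template

name = "Camel"
size = len(name)

percent_style = "Size of %s => %d" % (name, size)
format_style = "Size of {} => {}".format(name, size)
template_style = Template("Size of $name => $size").substitute(name=name, size=size)
fstring_style = f"Size of {name} => {size}"   # Python 3.6+

print(percent_style)   # Size of Camel => 5
print(percent_style == format_style == template_style == fstring_style)   # True
```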


In [22]:
print( 'Size of %s => %d' % (s, len(s)))
print(dir(s))
print( 'Size of %s => %d' % (s, s.__len__()))

def size(strdata):
    c = 0
    for a in strdata:
        c+=1
    return c

print(size("Anshu"))


Size of Camel => 5
['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
Size of Camel => 5
5

%-formatting

Str.format()

Template Strings

Literal String

Formatted string literals (f-strings) are the newest interpolation method; they were introduced in Python 3.6.


In [24]:
# Requires Python 3.6 or later; older interpreters raise SyntaxError here.
name = 'World'
program = 'Python'
print(f'Hello {name}! This is {program}')


Hello World! This is Python

In [5]:
# String processed as a sequence
s = "Murthy "
for ch in s: print(ch, end=',')  # end=',' replaces the default newline
# print(help(print))
print("\b.")
print("~"*79)


M,u,r,t,h,y, ,.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In [6]:
# Strings are objects
if s.startswith('M'): print(s.upper())

print(s.lower())
print("~"*79)

# what will happen? 
print(3*s) 

# print(dir(s))


MURTHY 
murthy 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Murthy Murthy Murthy 

In [7]:
s = "   Murthy "
age = 5
print(s + str(age))
print(s.strip(), age)
# print(s + age)  # would raise TypeError: a str and an int cannot be concatenated


   Murthy 5
Murthy 5

In [17]:
st = "    Mayank Johri    "
print(len(st))
s = st.strip()
print(len(s))
print(st.rstrip())
print(st.lstrip())


20
12
    Mayank Johri
Mayank Johri    

In [13]:
m = "Mohan Shah"
x = ["mon", "tues", "wed"]
y = ","
a = "On Leave"
print(y.join(x)) # -> mon,tues,wed
print(m.join(y)) 
print(a.join(y))
print(y.join(a)) 
print(a.join(m))


mon,tues,wed
,
,
O,n, ,L,e,a,v,e
MOn LeaveoOn LeavehOn LeaveaOn LeavenOn Leave On LeaveSOn LeavehOn LeaveaOn Leaveh
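
The outputs above follow from one rule: sep.join(iterable) places sep between the items of the iterable. A minimal sketch, with illustrative names:

```python
sep = ","
days = ["mon", "tues", "wed"]

# sep.join(iterable): the separator goes BETWEEN the items of the iterable.
print(sep.join(days))          # mon,tues,wed

# A string is itself an iterable of characters, so joining a string
# inserts the separator between its characters.
print(sep.join("abc"))         # a,b,c

# An iterable with a single item has nothing to separate.
print("On Leave".join(","))    # ,

# join() accepts only strings; convert other types first.
numbers = [1, 2, 3]
joined_numbers = "-".join(str(n) for n in numbers)
print(joined_numbers)          # 1-2-3
```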

Create a string from a list of string items


In [14]:
" ".join(x)


Out[14]:
'mon tues wed'

In [15]:
book_desc = ["This", "book", "is good"]
" ".join(book_desc)


Out[15]:
'This book is good'

The % operator is used for string interpolation. Interpolation builds the result in a single step, whereas a chain of + concatenations can create an intermediate string at every step.

Symbols used in the interpolation:

  • %s: string.
  • %d: decimal integer.
  • %o: octal.
  • %x: hexadecimal.
  • %f: floating point.
  • %e: floating point in exponential notation.
  • %%: percent sign.

Symbols can be used to display numbers in various formats.

Example:


In [20]:
# Zero padding on the left
print('Now is %02d:%02d.' % (6, 30))

# Real (the number after the decimal point specifies how many decimal digits)
print('Percent: %.1f%%, Exponential:%.2e' % (5.333, 0.00314))

# Octal and hexadecimal
print('Decimal: %d, Octal: %o, Hexadecimal: %x' % (10, 10, 10))


Now is 06:30.
Percent: 5.3%, Exponential:3.14e-03
Decimal: 10, Octal: 12, Hexadecimal: a
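
Two further features of the % operator, sketched under the assumption that they are useful alongside the symbols above: field width with alignment, and named values taken from a dictionary. The names row and point are illustrative.

```python
# Width and alignment: right-aligned by default, '-' left-aligns.
right = "[%5d]" % 42
left = "[%-5d]" % 42
print(right)    # [   42]
print(left)     # [42   ]

# Values can be supplied by name through a dictionary.
row = {"item": "tea", "price": 2.5}
named = "%(item)s costs %(price).2f" % row
print(named)    # tea costs 2.50

# A single value that is itself a tuple must be wrapped in a 1-tuple.
point = (3, 4)
wrapped = "point = %s" % (point,)
print(wrapped)  # point = (3, 4)
```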

format

In addition to the interpolation operator %, Python provides the string method str.format() and the built-in function format().

The built-in function format() formats only one value per call, whereas the method str.format() can fill several placeholders at once.

Examples:


In [19]:
peoples = [('Mayank', 'friend', 'Manish'),
('Mayank', 'reportee', 'Roshan Musheer')]

# Parameters are identified by order
msg = '{0} is {1} of {2}'

for name, relationship, friend in peoples:
    print(msg.format(name, relationship, friend))


Mayank is friend of Manish
Mayank is reportee of Roshan Musheer

In [36]:
# Parameters are identified by name
msg = '{greeting}, it is {hour:02d}:{minute:02d}'

print(msg.format(greeting='Good Morning', hour=9, minute=30))
print(msg)
# Builtin function format()
print ('Pi =', format(3.14159, '.3e'))
print ('Pi =', format(3.14159, '.1e'))


Good Morning, it is 09:30
{greeting}, it is {hour:02d}:{minute:02d}
Pi = 3.142e+00
Pi = 3.1e+00

>>> TODO !!!

Explain the below examples


In [40]:
'{} {}'.format('सूर्य', 'नमस्कार')


Out[40]:
'सूर्य नमस्कार'

In [41]:
'{1} {0}'.format('सूर्य', 'नमस्कार')


Out[41]:
'नमस्कार सूर्य'

In [42]:
'{:>10}'.format('test')


Out[42]:
'      test'

In [44]:
'{:20}'.format('सूर्य नमस्कार')


Out[44]:
'सूर्य नमस्कार       '

In [49]:
'{:4}'.format('Bonjour')


Out[49]:
'Bonjour'

In [51]:
'{:_<5}'.format('Ja')


Out[51]:
'Ja___'

In [58]:
'{:^7}'.format('こんにちは')


Out[58]:
' こんにちは '

In [55]:
'{:.5}'.format('Bonjour')


Out[55]:
'Bonjo'

In [102]:
'{:10.5}'.format('Bonjour')


Out[102]:
'Bonjo     '

In [106]:
'{:{align}{width}}'.format('Bonjour', align='^', width='9')


Out[106]:
' Bonjour '

In [107]:
'{:.{prec}} = {:.{prec}f}'.format('Bonjour', 2.22, prec=4)


Out[107]:
'Bonj = 2.2200'
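
The string examples above all use the same pattern inside the braces: an optional fill character and alignment (<, >, ^), then a width, then an optional .precision that truncates the text. A minimal sketch with an illustrative variable:

```python
word = "Bonjour"

left = "{:<10}".format(word)       # '<' is the default alignment for strings
right = "{:>10}".format(word)
centered = "{:*^11}".format(word)  # '*' is the fill character
truncated = "{:.3}".format(word)   # precision truncates strings
both = "{:>6.3}".format(word)      # truncate to 3, then pad to width 6

for value in (left, right, centered, truncated, both):
    print(repr(value))
```

A width smaller than the text, as in '{:4}'.format('Bonjour'), never truncates; only the precision does.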

In [66]:
'{:d}'.format(1980)


Out[66]:
'1980'

In [67]:
'{:f}'.format(3.141592653589793)


Out[67]:
'3.141593'

In [72]:
'{:4f}'.format(3.141592653589793)


Out[72]:
'3.141593'

In [77]:
'{:04d}'.format(119)


Out[77]:
'0119'

In [68]:
'{:06.2f}'.format(3.141592653589793)


Out[68]:
'003.14'

In [78]:
'{:+d}'.format(119)


Out[78]:
'+119'

In [79]:
'{:+d}'.format(-119)


Out[79]:
'-119'

In [86]:
### Need to find for complex & boolean numbers
## '{:+d+d}'.format(-3 + 2j)

In [89]:
'{:=5d}'.format((- 111))


Out[89]:
'- 111'

In [90]:
'{: d}'.format(101)


Out[90]:
' 101'
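
The sign and padding options used in the numeric examples can be compared side by side for a positive and a negative value. This is a sketch; the loop and the names are illustrative.

```python
values = (42, -42)

for v in values:
    plus = "{:+d}".format(v)      # always show a sign
    space = "{: d}".format(v)     # leading space for non-negative numbers
    padded = "{:=+6d}".format(v)  # '=' puts the padding between sign and digits
    zeros = "{:06d}".format(v)    # zero padding is sign-aware
    print(repr(plus), repr(space), repr(padded), repr(zeros))
```

This prints '+42' ' 42' '+   42' '000042' for 42 and '-42' '-42' '-   42' '-00042' for -42.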

In [92]:
'{name} {surname}'.format(name='Mayank', surname='Johri')


Out[92]:
'Mayank Johri'

In [95]:
user = dict(name='Mayank', surname='Johri')
'{u[name]} {u[surname]}'.format(u=user)


Out[95]:
'Mayank Johri'

In [97]:
lst = list(range(10))
'{l[2]} {l[7]}'.format(l=lst)


Out[97]:
'2 7'

In [100]:
from datetime import datetime
'{:%Y-%m-%d %H:%M}'.format(datetime(2017, 12, 23, 14, 15))


Out[100]:
'2017-12-23 14:15'

In [31]:
class Yoga(object):

    def __repr__(self):
        return 'सूर्य नमस्कार'

In [35]:
'{0!r} <-> {0!a}'.format(Yoga())


Out[35]:
'सूर्य नमस्कार <-> \\u0938\\u0942\\u0930\\u094d\\u092f \\u0928\\u092e\\u0938\\u094d\\u0915\\u093e\\u0930'
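
The !r and !a suffixes are conversion flags: !s applies str(), !r applies repr(), and !a applies ascii(), which escapes non-ASCII characters. The sketch below uses a hypothetical class, Pose, that exists only for this illustration.

```python
class Pose:
    """Hypothetical class used only for this illustration."""

    def __str__(self):
        return "str: café"

    def __repr__(self):
        return "repr: café"


pose = Pose()
as_str = "{!s}".format(pose)     # calls str()
as_repr = "{!r}".format(pose)    # calls repr()
as_ascii = "{!a}".format(pose)   # calls ascii(): non-ASCII is escaped

print(as_ascii)                  # repr: caf\xe9
print(as_str == str(pose), as_repr == repr(pose), as_ascii == ascii(pose))
```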

str built-in methods

Strings implement all of the common sequence operations, along with the additional methods described below.


In [3]:
myStr = "maya Deploy, version: 0.0.3 "

print(myStr.capitalize())
print(myStr.center(60))
print(myStr.center(60, "*"))
print(myStr.center(10, "*"))

print(myStr.count('a'))
print(myStr.count('e'))

print(myStr.endswith('all'))

print(myStr.endswith('.0.3'))
print(myStr.endswith('.0.3 '))

print(myStr.find("g"))
print(myStr.find("e"))


Maya deploy, version: 0.0.3 
                maya Deploy, version: 0.0.3                 
****************maya Deploy, version: 0.0.3 ****************
maya Deploy, version: 0.0.3 
2
2
False
False
True
-1
6

Note: The find() method should be used only if you need to know the position of sub. To check if sub is a substring or not, use the in operator:

Syntax: substring in main_string, which evaluates to True or False


In [6]:
print("m" in myStr)


True

In [5]:
print("M" in myStr)


False
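
The difference between the in operator, find() and index() can be sketched as follows. The variable text repeats the sample value used above, and the search is case-sensitive in all three cases.

```python
text = "maya Deploy, version: 0.0.3 "

print("Deploy" in text)        # True  (membership test)
print(text.find("Deploy"))     # 5     (position of the first match)
print(text.find("deploy"))     # -1    (no match; the search is case-sensitive)

# index() behaves like find() but raises ValueError when nothing is found.
try:
    text.index("deploy")
    index_failed = False
except ValueError:
    index_failed = True
print(index_failed)            # True

# For a case-insensitive check, compare lowercased copies.
print("deploy" in text.lower())   # True
```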

In [34]:
c = "one"
print(c.isalpha())
c = "1"
print(c.isalpha())


True
False

In [39]:
superscripts = "\u00B2"
five = "\u0A6B"
fractions = "\u00BC"  # defined here so the cell runs top to bottom
# str.isdecimal() (Only Decimal Numbers)
print(five)
print(c.isdecimal())
print(five.isdecimal())
print("10 ->", "10".isdecimal())
print("10.001".isdecimal())

# 'text' is used instead of 'str' to avoid shadowing the built-in type
text = "this 2009"
print(text.isdecimal())

text = "23443434"
print(text.isdecimal())
print(fractions.isdecimal())


੫
True
True
10 -> True
False
False
True
False

In [42]:
# str.isdigit() (Decimals, Subscripts, Superscripts)
fractions = "\u00BC"
print(fractions)
print(c.isdigit())
print(fractions.isdigit())
print(five.isdigit())

print("10".isdigit())
text = "this 2009"  # 'text' avoids shadowing the built-in str
print(text.isdigit())

text = "23443.434"
print(text.isdigit())


¼
True
False
True
True
False
False

In [29]:
print(superscripts)
print(superscripts.isdigit())
print(superscripts.isdecimal()) 
print(superscripts+superscripts)
print(fractions+fractions)


²
True
False
²²
¼¼

In [30]:
# str.isnumeric() (Digits, Fractions, Subscripts, Superscripts, Roman Numerals, Currency Numerators)
print(fractions)
print(fractions.isnumeric())
print(five.isnumeric())


¼
True
True
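
The three predicates form a hierarchy: every character accepted by isdecimal() is also accepted by isdigit(), and every character accepted by isdigit() is also accepted by isnumeric(). A sketch that tabulates the sample characters used above; the dictionary names are illustrative.

```python
samples = {
    "ascii digit": "7",
    "gurmukhi five": "\u0A6B",
    "superscript two": "\u00B2",
    "fraction quarter": "\u00BC",
    "letter": "a",
}

# Map each label to (isdecimal, isdigit, isnumeric).
results = {
    label: (ch.isdecimal(), ch.isdigit(), ch.isnumeric())
    for label, ch in samples.items()
}

print("{:18}{:10}{:10}{}".format("sample", "isdecimal", "isdigit", "isnumeric"))
for label, flags in results.items():
    print("{:18}{!s:10}{!s:10}{!s}".format(label, *flags))
```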

In [31]:
print(myStr.isalnum())
print("one".isalnum())
print("thirteen".isalnum())


False
True
True

String Module


Various constants and helpers for dealing with text are implemented in the module string.


In [14]:
import string

# the alphabet
print(dir(string))
a = string.ascii_letters
print(a)
# Shifting left the alphabet
b = a[1:] + a[0]
print(b)
print(b.__doc__)
print(string.digits)
print(string.hexdigits)
print(help(string.printable))


['ChainMap', 'Formatter', 'Template', '_TemplateMetaclass', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '_re', '_string', 'ascii_letters', 'ascii_lowercase', 'ascii_uppercase', 'capwords', 'digits', 'hexdigits', 'octdigits', 'printable', 'punctuation', 'whitespace']
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
bcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZa
str(object='') -> str
str(bytes_or_buffer[, encoding[, errors]]) -> str

Create a new string object from the given object. If encoding or
errors is specified, then the object must expose a data buffer
that will be decoded using the given encoding and error handler.
Otherwise, returns the result of object.__str__() (if defined)
or repr(object).
encoding defaults to sys.getdefaultencoding().
errors defaults to 'strict'.
0123456789
0123456789abcdefABCDEF
no Python documentation found for '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

None

Template

The module also implements a type called Template, which is a model string that can be filled in from a dictionary. Identifiers are introduced by a dollar sign ($) and may be surrounded by curly braces to avoid ambiguity; $$ produces a literal dollar sign.

Example:


In [3]:
import string

# Creates a template string
st = string.Template('$warning occurred in $when $$what')

# Fills the model with a dictionary
s = st.substitute({'warning': 'Lack of electricity',
    'when': 'April 3, 2002'})

# Shows:
# Lack of electricity occurred in April 3, 2002 $what
print(s)


Lack of electricity occurred in April 3, 2002 $what
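
Two details of Template are worth a short sketch: curly braces separate an identifier from the text that follows it, and safe_substitute() tolerates missing values where substitute() raises KeyError. The template text and names below are illustrative only.

```python
from string import Template

tmpl = Template("${item}s cost $$${price} each, says $who")

# substitute() raises KeyError when a placeholder has no value.
try:
    tmpl.substitute(item="apple", price=2)
    missing = None
except KeyError as exc:
    missing = exc.args[0]
print(missing)    # who

# safe_substitute() leaves unknown placeholders in place instead.
partial = tmpl.safe_substitute(item="apple", price=2)
print(partial)    # apples cost $2 each, says $who
```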

In [1]:
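# Unicode string (in Python 3 every str is Unicode; the u prefix is optional)
u = u'Hüsker Dü'
# Encode to bytes
s = u.encode('latin1')
print(s, '=>', type(s))

# String str
s = 'Hüsker Dü'
# u = s.decode('latin1')  # Python 2 only: str has no decode() in Python 3

print(repr(u), '=>', type(u))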


b'H\xfcsker D\xfc' => <class 'bytes'>
'Hüsker Dü' => <class 'str'>

The methods encode() and decode() both accept the name of the encoding as an argument; in Python 3 it defaults to "utf-8". The most commonly used encodings are "latin1" and "utf8".
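
A round trip through both encodings can be sketched as follows, assuming Python 3 and a UTF-8 source file. The same text occupies a different number of bytes in each encoding, and decoding with the wrong codec either yields wrong characters or fails.

```python
text = "Hüsker Dü"

utf8_bytes = text.encode("utf-8")      # 'ü' takes two bytes in UTF-8
latin1_bytes = text.encode("latin-1")  # 'ü' takes one byte in Latin-1
print(len(text), len(utf8_bytes), len(latin1_bytes))   # 9 11 9

# Decoding with the same codec restores the original text.
print(utf8_bytes.decode("utf-8") == text)              # True

# Decoding with a different codec silently produces wrong characters...
garbled = utf8_bytes.decode("latin-1")
print(garbled == text)                                 # False

# ...or fails outright, depending on the bytes involved.
try:
    latin1_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
print(decode_failed)                                   # True
```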
